# ARM Processor Overview

Hancheol Cho

## **ARM Processor Family**



### ARM Architecture Classification

- A The *Application* profile defines a VMSA based microprocessor architecture. It is targeted at high performance processors, capable of running full feature operating systems. It supports the ARM and Thumb instruction sets.
- R The *Real-time* profile defines a PMSA based microprocessor architecture. It is targeted at systems that require deterministic timing and low interrupt latency. It supports the ARM and Thumb instruction sets.
- M The *Microcontroller* profile provides low-latency interrupt processing accessible directly from high-level programming languages. It has a different exception handling model to the other profiles, implements a variant of the PMSA, and supports a variant of the Thumb instruction set only.

| Profile   | Architecture | Instruction Set | Processor        |
|-----------|--------------|-----------------|------------------|
| A-Profile | ARMv7-A      | A32, T32        | Cortex-A Series  |
| R-Profile | ARMv7-R      | A32, T32        | Cortex-R Series  |
| M-Profile | ARMv7-M      | T32             | Cortex-M Series  |
| M-Profile | ARMv6-M      | T32             | Cortex-M0 Series |

### **Cortex-M Processor**

| Category               | Processor  | Positioning                                 |
|------------------------|------------|---------------------------------------------|
| Performance efficiency | Cortex-M3  | Performance efficiency                      |
| Performance efficiency | Cortex-M4  | Mainstream control and DSP                  |
| Performance efficiency | Cortex-M7  | Maximum performance, control and DSP        |
| Performance efficiency | Cortex-M33 | Flexibility, control and DSP with TrustZone |
| Lowest power & area    | Cortex-M0  | Low power (DesignStart)                     |
| Lowest power & area    | Cortex-M0+ | Highest energy efficiency                   |
| Lowest power & area    | Cortex-M23 | TrustZone in smallest area, lowest power    |
| SecurCore              | SC000      | ...ed area, anti-tampering                  |
| SecurCore              | SC300      | Performance, anti-tampering                 |

|                 | Cortex-M0      | Cortex-M0+     | Cortex-M3      | Cortex-M4                    | Cortex-M7                               |
|-----------------|----------------|----------------|----------------|------------------------------|-----------------------------------------|
| Architecture    | ARMv6-M        | ARMv6-M        | ARMv7-M        | ARMv7-M                      | ARMv7-M                                 |
| Instruction set | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2, DSP, FP (SP) | Thumb, Thumb-2, DSP, FP (SP, or SP+DP)  |

### **Cortex-M Instruction Set support**



### Programmer's model



### **Real Time OS?**

# **Exception Vector**

- The initial stack pointer value can be specified in the vector table
  - Startup code can therefore also be written in C
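As a rough illustration of why this matters: at reset a Cortex-M core loads the main stack pointer (MSP) from the first word of the vector table and the program counter from the second word, so a valid stack already exists when the first C function runs. The sketch below models that behavior on a host machine. `ResetState`, `model_reset`, and the values in `sample_image` are hypothetical names and numbers chosen for illustration; they are not part of any vendor API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the core state right after reset. */
typedef struct {
    uint32_t msp;  /* main stack pointer, loaded from vector table word 0 */
    uint32_t pc;   /* address of the first reset handler instruction */
} ResetState;

/* Model what the hardware does at reset: word 0 goes to MSP and
 * word 1 (the reset vector) goes to PC. Bit 0 of the reset vector
 * must be 1 to select Thumb state; it is not part of the address. */
static ResetState model_reset(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    assert((vector_table[1] & 1u) == 1u);  /* Thumb bit must be set */

    ResetState s;
    s.msp = vector_table[0];
    s.pc  = vector_table[1] & ~1u;
    return s;
}

/* Illustrative image only: stack top at 0x20010000,
 * reset handler at 0x08000200 (stored with bit 0 set). */
static const uint32_t sample_image[2] = { 0x20010000u, 0x08000201u };
```

Calling `model_reset(sample_image)` yields an MSP of `0x20010000` and a PC of `0x08000200`. Because the hardware sets up the stack itself, no assembly stub is needed before entering a C reset handler.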

| Exception number | ARMv6-M                                | ARMv7-M                                |
|------------------|----------------------------------------|----------------------------------------|
| 48–255           |                                        | Device Specific Interrupts             |
| 16–47            | Device Specific Interrupts             | Device Specific Interrupts             |
| 15               | SysTick                                | SysTick                                |
| 14               | PendSV                                 | PendSV                                 |
| 13               | Not used                               | Not used                               |
| 12               | Not used                               | Debug Monitor                          |
| 11               | SVC                                    | SVC                                    |
| 7–10             | Not used                               | Not used                               |
| 6                | Not used                               | Usage Fault                            |
| 5                | Not used                               | Bus Fault                              |
| 4                | Not used                               | MemManage (fault)                      |
| 3                | HardFault                              | HardFault                              |
| 2                | NMI                                    | NMI                                    |
| 1                | Reset                                  | Reset                                  |
| 0                | Not an exception (initial MSP value)   | Not an exception (initial MSP value)   |

| Vector table entry             | Bit 0 | Vector address (initial) |
|--------------------------------|-------|--------------------------|
| Interrupt #239 vector          | 1     | 0x000003FC               |
| Interrupt #31 vector           | 1     | 0x000000BC               |
| Interrupt #1 vector            | 1     | 0x00000044               |
| Interrupt #0 vector            | 1     | 0x00000040               |
| SysTick vector                 | 1     | 0x0000003C               |
| PendSV vector                  | 1     | 0x00000038               |
| Not used                       |       | 0x00000034               |
| Debug Monitor vector           | 1     | 0x00000030               |
| SVC vector                     | 1     | 0x0000002C               |
| Not used                       |       | 0x00000028               |
| Not used                       |       | 0x00000024               |
| Not used                       |       | 0x00000020               |
| SecureFault (ARMv8-M Mainline) | 1     | 0x0000001C               |
| Usage Fault vector             | 1     | 0x00000018               |
| Bus Fault vector               | 1     | 0x00000014               |
| MemManage vector               | 1     | 0x00000010               |
| HardFault vector               | 1     | 0x0000000C               |
| NMI vector                     | 1     | 0x00000008               |
| Reset vector                   | 1     | 0x00000004               |
| MSP initial value              |       | 0x00000000               |
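The addresses in the vector table follow a simple rule: each entry is 4 bytes wide, so the vector for exception number N sits at offset N × 4, and Interrupt #n is exception number 16 + n. A minimal sketch of that arithmetic follows; `vector_offset` and `irq_vector_offset` are hypothetical helper names, and the offsets are relative to the initial table base of 0x00000000.

```c
#include <assert.h>
#include <stdint.h>

enum { FIRST_IRQ_EXCEPTION = 16 };  /* exception number of Interrupt #0 */

/* Offset of the vector for an exception number
 * (1 = Reset, 3 = HardFault, 15 = SysTick, ...).
 * Entry 0 is not a vector; it holds the initial MSP value. */
static uint32_t vector_offset(uint32_t exception_number)
{
    assert(exception_number <= 255u);
    return exception_number * 4u;
}

/* Offset of the vector for a device-specific interrupt number. */
static uint32_t irq_vector_offset(uint32_t irq_number)
{
    assert(irq_number <= 239u);
    return vector_offset(FIRST_IRQ_EXCEPTION + irq_number);
}
```

For example, `vector_offset(15)` gives `0x3C` for SysTick, and `irq_vector_offset(239)` gives `0x3FC`, matching the first row of the table.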

# Performance Comparison

#### Data Processing Speed

|            | Dhrystone DMIPS/MHz (v2.1), official     | Dhrystone DMIPS/MHz (v2.1), full optimization     | CoreMark/MHz (v1.0) |
|------------|------------------------------------------|---------------------------------------------------|---------------------|
| Cortex-M0  | 0.84                                     | 1.21                                              | 2.33                |
| Cortex-M0+ | 0.94                                     | 1.31                                              | 2.42                |
| Cortex-M3  | 1.25                                     | 1.89                                              | 3.32                |
| Cortex-M4  | 1.25                                     | 1.95                                              | 3.40                |
| Cortex-M7  | 2.14                                     | 2.55                                              | 5.01                |
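The per-MHz figures become absolute throughput once multiplied by the clock frequency. The sketch below does that arithmetic, assuming performance scales linearly with clock speed (in practice flash wait states and the memory system can lower the real figure). `estimated_dmips` is a hypothetical helper name.

```c
#include <assert.h>

/* Estimate absolute Dhrystone throughput from a DMIPS/MHz rating.
 * Assumption: linear scaling with clock frequency, no memory stalls. */
static double estimated_dmips(double dmips_per_mhz, double clock_mhz)
{
    assert(dmips_per_mhz > 0.0 && clock_mhz > 0.0);
    return dmips_per_mhz * clock_mhz;
}
```

With the official Cortex-M7 rating of 2.14 DMIPS/MHz and the 216 MHz clock of the STM32F746 used on OpenCR, `estimated_dmips(2.14, 216.0)` comes to roughly 462 DMIPS.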

#### Interrupt Latency

|            | Interrupt latency (number of clock cycles) |
|------------|--------------------------------------------|
| Cortex-M0  | 16                                         |
| Cortex-M0+ | 15                                         |
| Cortex-M3  | 12                                         |
| Cortex-M4  | 12                                         |
| Cortex-M7  | Typically 12, worst case 14                |
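Cycle counts turn into wall-clock latency by dividing by the clock frequency. The following sketch performs that conversion, assuming a zero-wait-state memory system so that the cycle counts in the table apply as listed. `latency_ns` is a hypothetical helper name.

```c
#include <assert.h>

/* Convert an interrupt latency in clock cycles to nanoseconds.
 * One cycle at f MHz lasts 1000 / f nanoseconds. */
static double latency_ns(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

For a Cortex-M7 at 216 MHz, the typical 12-cycle latency is about 55.6 ns and the 14-cycle worst case is about 64.8 ns.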

### **Interrupt Latency**

Interrupt Latency?



- Comparison with conventional microcontrollers





### DSP?

#### DSP Advantages

- 8 functional units (up to 8 instructions can execute in a single clock cycle)
- Very-Long-Instruction-Word (VLIW)
- Data bus up to 256 bits wide



Parallel execution through software pipelining

```
    MVKL  L1DCC, A0     ; \
 || MVKL  L1PCC, B0     ;  | Generate L1DCC pointer in A0
    MVKH  L1DCC, A0     ;  | and L1PCC pointer in B0
 || MVKH  L1PCC, B0     ; /
 || MVK   1b, A1        ; \ OPER encoding for 'freeze'
 || MVK   1b, B1        ; / in both A1 and B1.
    STW   A1, *A0       ; Write to L1DCC.OPER
 || STW   B1, *B0       ; Write to L1PCC.OPER
    LDW   *A0, A1       ; Get old freeze state into A1 from L1DCC
 || LDW   *B0, B1       ; Get old freeze state into B1 from L1PCC
    NOP   4
    ; At this point, L1D and L1P are frozen.
    ; The old value of L1DCC.OPER is in bit 16 of A1.
    ; The old value of L1PCC.OPER is in bit 16 of B1.
```

### DSP?

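- DSP advantages
  - Powerful DMA capability
    - Some image processing can be done with DMA alone

- DSP disadvantages
  - Because instructions are long, interrupts have a large impact on computation speed
    - Optimized code runs with interrupts disabled by default, so take care with optimization options when interrupts are in use
    - Often paired with an ARM processor in a dual-processor configuration
  - Speed varies widely with the level of optimization
    - There are 8 functional units, but some instruction types cannot execute simultaneously
    - Compiler options alone can only optimize so far, so the optimized libraries provided by TI are recommended
  - Strongly affected by the cache
    - Because both instructions and data are large, execution speed differs greatly between running from cache memory and running from external memory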

### **OpenCR (Open-source Control Module for ROS)**

- STM32F746ZGT6: 216 MHz Cortex-M7, 1 MB Flash, 320 KB SRAM
- Arduino Uno pin headers
- Arduino IDE development environment support
- Dynamixel/OLLO/UART/CAN interfaces
- Battery input and power outputs (12 V / 5 V / 3.3 V)



### **OpenCR (Open-source Control Module for ROS)**

- Hardware resources
  - https://github.com/ROBOTIS-GIT/OpenCR-Hardware
- Firmware resources
  - https://github.com/ROBOTIS-GIT/OpenCR